Unsupervised Resource Creation for Textual Inference Applications
Abstract
This paper explores how a battery of unsupervised techniques can be used to create large, high-quality corpora for textual inference applications, such as systems for recognizing textual entailment (TE) and textual contradiction (TC). We show that it is possible to automatically generate sets of positive and negative instances of textual entailment and contradiction from textual corpora with greater than 90% precision. We describe how we generated more than 1 million TE pairs, and a corresponding set of 500,000 TC pairs, from the documents in the 2 GB AQUAINT-2 newswire corpus.
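The "greater than 90% precision" figure is a ratio: of the generated pairs that were checked, the fraction judged to be correct instances of entailment or contradiction. The abstract does not say how this was measured. The sketch below therefore assumes precision was estimated by manually judging a sample of generated pairs. The function name `estimate_precision` and the 20-pair sample are hypothetical, not taken from the paper.

```python
def estimate_precision(judgments: list[bool]) -> float:
    """Return the fraction of sampled generated pairs judged correct.

    Hypothetical helper: each entry is one human judgment of whether an
    automatically generated TE or TC pair is a valid instance.
    """
    if not judgments:
        raise ValueError("need at least one judged pair")
    return sum(judgments) / len(judgments)


# Hypothetical sample: 19 of 20 generated pairs judged correct.
sample = [True] * 19 + [False]
precision = estimate_precision(sample)
print(f"{precision:.2f}")   # 0.95
print(precision > 0.90)     # True
```

With a small sample like this, the estimate is coarse. A claim such as "greater than 90%" is more convincing when it rests on a larger judged sample.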
Similar resources
Without a "doubt"? Unsupervised Discovery of Downward-Entailing Operators
An important part of textual inference is making deductions involving monotonicity, that is, determining whether a given assertion entails restrictions or relaxations of that assertion. For instance, the statement ‘We know the epidemic spread quickly’ does not entail ‘We know the epidemic spread quickly via fleas’, but ‘We doubt the epidemic spread quickly’ entails ‘We doubt the epidemic spread...
A Test Suite for Inference Involving Adjectives
Recently, most of the research in NLP has concentrated on the creation of applications coping with textual entailment. However, there still exist very few resources for the evaluation of such applications. We argue that the reason for this resides not only in the novelty of the research field but also and mainly in the difficulty of defining the linguistic phenomena which are responsible for in...
Knowledge-Based Textual Inference via Parse-Tree Transformations
Textual inference is an important component in many applications for understanding natural language. Classical approaches to textual inference rely on logical representations for meaning, which may be regarded as “external” to the natural language itself. However, practical applications usually adopt shallower lexical or lexical-syntactic representations, which correspond closely to language st...
Annotating Lexically Entailed Subevents for Textual Inference Tasks
This paper presents a procedure for constructing an Event Structure Lexicon (ESL), a resource which represents the lexically-entailed subevents in text as a support for textual inference tasks. The ESL is used as a resource for a subevent markup algorithm, called SUBEVITA, which annotates event implicatures on top of TimeML-based extraction algorithms. Such a resource can be used independently ...
LEDIR: An Unsupervised Algorithm for Learning Directionality of Inference Rules
Semantic inference is a core component of many natural language applications. In response, several researchers have developed algorithms for automatically learning inference rules from textual corpora. However, these rules are often either imprecise or underspecified in directionality. In this paper we propose an algorithm called LEDIR that filters incorrect inference rules and identifies the d...